Efficient Real-time Index Updates in Text Retrieval Systems
Abstract
As information retrieval (IR) systems emerge as the mainstream information-finding tool within commercial enterprises, owing to the enormous popularity of World Wide Web (WWW) technology in intranet environments, the ability to incorporate new and/or updated documents into the database in real time becomes an essential requirement. However, conventional IR systems are optimized for read queries by employing aggressive indexing and/or query result caching. Maintaining consistency between the indexes and the underlying database is expensive, but the cost is assumed to be tolerable because update processing is performed in batches. This paper describes the design and implementation of a real-time index/cache consistency maintenance technique in Codir, a text retrieval system that features an integrated compression and indexing scheme. The main idea behind Codir's real-time index consistency maintenance technique is to build a transient index for new document updates and to process read queries using both the permanent and the transient index. To minimize the performance overhead associated with document database updates, Codir integrates the transient index with the permanent index lazily, either by piggybacking the integration task on read query processing or by periodic batch processing. Finally, unlike previous IR systems, Codir supports document deletion, based on a lazy invalidation approach.
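The transient/permanent split described in the abstract can be illustrated with a minimal sketch. This is not Codir's implementation: the class name `LazyMergedIndex`, the `merge_threshold` parameter, and the whitespace tokenizer are all assumptions made for illustration, and Codir's integrated compression is not modelled. The sketch only shows the general pattern of indexing updates into a small transient index, answering queries from both indexes, invalidating deletions lazily, and piggybacking the merge on a read query.

```python
from collections import defaultdict


class LazyMergedIndex:
    """Toy inverted index with a permanent and a transient part.

    All names here are hypothetical; this is an illustration of the idea,
    not the Codir design. Document ids are assumed never to be reused.
    """

    def __init__(self, merge_threshold=4):
        self.permanent = defaultdict(set)  # term -> doc ids (read-optimized part)
        self.transient = defaultdict(set)  # term -> doc ids for recent updates
        self.deleted = set()               # doc ids invalidated but not yet purged
        self.pending = 0                   # documents waiting in the transient index
        self.merge_threshold = merge_threshold

    def add(self, doc_id, text):
        """Index a new document in the transient index only."""
        for term in text.lower().split():
            self.transient[term].add(doc_id)
        self.pending += 1

    def delete(self, doc_id):
        """Lazy invalidation: mark the id now, purge postings at merge time."""
        self.deleted.add(doc_id)

    def search(self, term):
        """Answer from both indexes, hide deleted documents, then maybe merge."""
        term = term.lower()
        hits = self.permanent.get(term, set()) | self.transient.get(term, set())
        hits = hits - self.deleted
        if self.pending >= self.merge_threshold:
            self.merge()  # piggyback the integration on read query processing
        return sorted(hits)

    def merge(self):
        """Fold the transient index into the permanent one and purge deletions."""
        for term, docs in self.transient.items():
            self.permanent[term] |= docs
        self.transient.clear()
        self.pending = 0
        if self.deleted:
            for term in list(self.permanent):
                self.permanent[term] -= self.deleted
                if not self.permanent[term]:
                    del self.permanent[term]
            self.deleted.clear()
```

In this sketch a newly added document is searchable immediately, because `search` consults the transient index. Calling `merge()` from a timer instead of from `search` would correspond to the periodic batch alternative mentioned in the abstract.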
Similar resources
Efficient Transaction Management & Query Processing in Massive Digital Databases
We address several important issues that arise in the development of Massive Digital Database Systems (MDDSs) in which data is being added continuously and on which users pose queries on the fly. News-on-demand and document retrieval systems are examples of systems that have these characteristics. Given the size of data, metadata such as index structures become even more important in these system...
Compression: A Key for Next-Generation Text Retrieval Systems
In this article we discuss recent methods for compressing the text and the index of text retrieval systems. By compressing both the complete text and the index, the total amount of space is less than half the size of the original text alone. Most surprisingly, the time required to build the index and also to answer a query is much less than if the index and text had not been compressed. This is...
DESIGN AND IMPLEMENTATION OF FUZZY EXPERT SYSTEM FOR REAL ESTATE RECOMMENDATION
SIREn: Entity Retrieval System for the Web of Data
We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an “entity retrieval system” specifically designed to meet the requirements of indexing and searching a large amount of semi-structured data, e.g. the entire Web of Data. SIREn supports efficient full text search with semi-structural queries and exhibits a concise index, constant time updates and inherits Information ...